Neural Networks - Bank Churn Prediction

Background and Context

Service businesses such as banks must contend with 'churn', i.e., customers leaving to join another service provider. It is important to understand which aspects of the service influence a customer's decision to stay or leave, so that management can concentrate improvement efforts on those priorities.

Objective

Given a bank customer's profile, build a neural network-based classifier that can determine whether the customer will leave the bank within the next 6 months.

Data Description

The case study uses an open-source dataset from Kaggle. The dataset contains 10,000 sample points with 14 distinct features such as CustomerId, CreditScore, Geography, Gender, Age, Tenure, Balance, etc.

Data Dictionary

Importing data
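A minimal loading sketch. In the notebook this would typically be `pd.read_csv("Churn_Modelling.csv")` (the usual Kaggle filename, assumed here); to stay self-contained, the snippet below parses two illustrative rows inline instead of reading from disk:

```python
import io
import pandas as pd

# In the notebook: df = pd.read_csv("Churn_Modelling.csv")
# Two illustrative rows with the dataset's 14-column schema:
sample_csv = io.StringIO(
    "RowNumber,CustomerId,Surname,CreditScore,Geography,Gender,Age,Tenure,"
    "Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited\n"
    "1,15634602,Hargrave,619,France,Female,42,2,0.00,1,1,1,101348.88,1\n"
    "2,15647311,Hill,608,Spain,Female,41,1,83807.86,1,0,1,112542.58,0\n"
)
df = pd.read_csv(sample_csv)
print(df.shape)  # (2, 14) -- 14 features, as described above
```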

Overview of the dataset

Univariate Analysis

Pandas-profiling Report Review

RowNumber: Not useful, will be dropped

CustomerId: Not useful, will be dropped

Surname: Not useful, will be dropped

Age: Distribution is right-skewed (skewness ≈ 1.01). Mean is 38.9 years and median is 37 years. No missing values; the interquartile range is about 12 years. CV is about 26%.

CreditScore: Distribution is slightly left-skewed (skewness ≈ -0.07). Mean is 652 and median is 650 points. No missing values; the interquartile range is about 134 points. CV is about 14.9%.

Geography: France is the most frequent category, with Germany and Spain close behind in count. Will one-hot encode.

Gender: 55% male, 45% female. No missing values; will one-hot encode.

Tenure: Distribution is not skewed. Mean is 5.0 years and median is 5 years. About 4% of values are 0; the interquartile range is about 4 years. CV is about 57%.

Balance: Distribution is slightly left-skewed (skewness ≈ -0.14), largely driven by the large number of 0 balances. Mean is 76,485 and median is 97,198. About 36% of values are 0; the interquartile range is about 127,644. CV is about 82%.

NumOfProducts: Most customers have either 1 or 2 products. No issues with 0s or missing values.

HasCrCard: 70% have a card while 30% don't. No issues with 0s or missing values.

IsActiveMember: 51% are active members while 49% are not. No issues with 0s or missing values.

EstimatedSalary: Distribution is relatively flat (near-uniform). Mean and median are both about 100k. No missing values; the interquartile range is about 100k. CV is about 57%.

Exited: Target variable. About 20% of customers churned.

One-hot encoding some variables

Dropping some variables
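The two steps above can be sketched as follows (the toy DataFrame stands in for the real one loaded earlier):

```python
import pandas as pd

# Toy stand-in with the relevant columns
df = pd.DataFrame({
    "RowNumber": [1, 2, 3],
    "CustomerId": [101, 102, 103],
    "Surname": ["A", "B", "C"],
    "Geography": ["France", "Germany", "Spain"],
    "Gender": ["Female", "Male", "Male"],
    "Exited": [1, 0, 0],
})

# Drop identifier columns that carry no predictive signal
df = df.drop(columns=["RowNumber", "CustomerId", "Surname"])

# One-hot encode the categoricals; drop_first avoids a redundant column
df = pd.get_dummies(df, columns=["Geography", "Gender"], drop_first=True)
print(df.columns.tolist())
```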

Let's check the missing values
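A quick missing-value check, sketched on a toy frame with NaNs injected for illustration:

```python
import numpy as np
import pandas as pd

# Toy frame; the real dataset has no missing values per the profiling above
df = pd.DataFrame({"Age": [42, np.nan, 39], "Balance": [0.0, 83807.86, np.nan]})

missing_counts = df.isnull().sum()       # missing values per column
missing_pct = df.isnull().mean() * 100   # as a percentage of rows
print(missing_counts)
print(missing_pct)
```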

Let's Explore the data

Checking the correlation between the features and the likelihood of a customer churning on the imbalanced dataset

Separating response variable and predictors

Splitting the Data into train and test set
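A minimal splitting sketch using scikit-learn; synthetic arrays stand in for the real predictors and response, and the 70/30 split ratio is an assumption:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for the predictors X and response y
X = np.random.rand(100, 5)
y = np.array([0] * 80 + [1] * 20)  # ~20% churn, as in the dataset

# Stratify on y so both splits preserve the 80/20 class balance
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
print(X_train.shape, X_test.shape)
```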

Model Building

Decision Tree

Model evaluation criterion

The model can make two kinds of wrong prediction:

1. Predicting a customer will leave the bank when they will not (a false positive).
2. Predicting a customer will not leave the bank when they will (a false negative).

Which case is more important? Losing a customer the bank believed would stay (a false negative) is the costlier error, because the bank gets no chance to intervene and retain them.

How do we reduce this loss, i.e., reduce false negatives? By using recall as the evaluation metric: the higher the recall, the fewer churners the model misses.
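One common way to trade false negatives for false positives is to lower the classification threshold; a small sketch with made-up probabilities shows recall rising as the threshold drops:

```python
import numpy as np
from sklearn.metrics import recall_score

# Made-up true labels and predicted churn probabilities
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 1])
y_prob = np.array([0.9, 0.4, 0.35, 0.2, 0.1, 0.45, 0.05, 0.6])

# Default 0.5 threshold misses the churners scored 0.4 and 0.35
pred_default = (y_prob >= 0.5).astype(int)
# A lowered threshold catches more churners, cutting false negatives
pred_low = (y_prob >= 0.3).astype(int)

print(recall_score(y_true, pred_default))  # 0.5
print(recall_score(y_true, pred_low))      # 1.0
```

The price of the lower threshold is more false positives (e.g., the non-churner scored 0.45 is now flagged), which is why threshold choice depends on the relative cost of each error.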

Let's now explore Neural Network models

Data Pre-processing

Start with standardizing columns

Standardize the data
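A standardization sketch with scikit-learn's `StandardScaler`; the key point is fitting on the training set only and reusing that fit for the test set (the tiny arrays are illustrative):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Illustrative stand-ins for two numeric columns (e.g., CreditScore, Age)
X_train = np.array([[619.0, 42.0], [608.0, 41.0], [502.0, 58.0]])
X_test = np.array([[699.0, 39.0]])

# Fit on the training set only, then apply the same transform to test
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)

print(X_train_s.mean(axis=0))  # ~0 per column
print(X_train_s.std(axis=0))   # ~1 per column
```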

Deep neural network

Model-1

Dropout

Dropout is a regularization technique for neural network models proposed by Srivastava et al. in their 2014 paper "Dropout: A Simple Way to Prevent Neural Networks from Overfitting". During training, randomly selected neurons are ignored, i.e., "dropped out", which discourages units from co-adapting and reduces overfitting.
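The mechanism can be sketched in plain NumPy (this is "inverted" dropout, the variant modern frameworks use internally; in Keras one simply adds a `Dropout(rate)` layer):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, rate=0.5, training=True):
    """Inverted dropout: zero a random subset of units, rescale the rest."""
    if not training:
        return activations  # dropout is disabled at inference time
    keep_prob = 1.0 - rate
    mask = rng.random(activations.shape) < keep_prob
    # Rescale by 1/keep_prob so the expected activation is unchanged
    return activations * mask / keep_prob

a = np.ones((4, 8))
dropped = dropout(a, rate=0.5)
print(dropped)  # roughly half the entries are 0, the rest are 2.0
```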

Creating a model

A Keras model object can be created with the Sequential class

At the outset the model is empty; it is completed by adding layers and then compiling

Adding layers [layers and activations]

Keras layers can be added to the model

Adding layers is like stacking Lego blocks one by one

It should be noted that, as this is a binary classification problem, the output layer should use a sigmoid activation (softmax for multi-class problems)

Model compile [optimizers and loss functions]

A Keras model should be "compiled" prior to training

The loss function and the optimizer must be designated

Let's print the summary of the model
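The build-and-compile steps above can be sketched as follows. The layer widths and the input dimension of 11 (the features remaining after dropping the IDs and one-hot encoding Geography and Gender) are assumptions, not the notebook's exact architecture:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Architecture is illustrative; input dim 11 is an assumption
model = keras.Sequential([
    keras.Input(shape=(11,)),
    layers.Dense(32, activation="relu"),
    layers.Dropout(0.2),                   # dropout regularization
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"), # churn probability output
])

# Designate the optimizer and the loss function
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()
```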

Training [Forward pass and Backpropagation]

Training the model
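A training sketch on synthetic data (real training would use the scaled train split; epochs, batch size, and the 20% validation split are illustrative):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Synthetic stand-ins for the scaled training data
X_train = np.random.rand(64, 11).astype("float32")
y_train = np.random.randint(0, 2, size=(64,))

model = keras.Sequential([
    keras.Input(shape=(11,)),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

# validation_split holds out part of the data to track validation loss
history = model.fit(X_train, y_train, epochs=2, batch_size=16,
                    validation_split=0.2, verbose=0)
print(sorted(history.history.keys()))
```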

Plotting the train and test loss
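The loss curves come straight from the `history.history` dict returned by `fit()`; the values below are illustrative stand-ins so the sketch runs on its own:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt

# Stand-in for history.history from model.fit(); values are illustrative
history = {"loss": [0.60, 0.48, 0.43, 0.41],
           "val_loss": [0.58, 0.50, 0.47, 0.46]}

plt.figure()
plt.plot(history["loss"], label="train loss")
plt.plot(history["val_loss"], label="validation loss")
plt.xlabel("Epoch")
plt.ylabel("Binary cross-entropy")
plt.legend()
plt.savefig("loss_curves.png")
```

A widening gap between the two curves is the usual sign of overfitting, which is what the dropout layer is meant to counteract.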

Evaluation

A Keras model can be evaluated with the evaluate() function

Evaluation results are returned as a list: the loss first, followed by the compiled metrics

NN Again - Optimize on AUC
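Tracking AUC-ROC instead of accuracy alone is a one-line change at compile time via `tf.keras.metrics.AUC`; the architecture below is again an illustrative assumption:

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(11,)),            # input dim 11 is an assumption
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])

# Compile with AUC-ROC as the tracked metric instead of accuracy
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[keras.metrics.AUC(name="auc")])
```

During `fit()`, Keras then reports `auc` and `val_auc` per epoch, which can be used for early stopping or model selection.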

Conclusion

When focusing on recall as the target metric, the Decision Tree model gives better results than the first Neural Network model, which performs poorly in terms of false positives. However, when optimizing for AUC-ROC, as in the second iteration of the Neural Network, we end up with a much more balanced model. The final model improves on the decision tree not only in AUC-ROC but in recall as well.